Abstract 

Background: Pinus pinaster Ait. is a major resin-producing species in Spain. Genetic linkage mapping can facilitate marker-assisted selection (MAS) through the identification of Quantitative Trait Loci (QTL) and the selection of allelic variants of interest in breeding populations. In this study, we report annotated genetic linkage maps for two individuals (C14 and C15) belonging to a breeding program aiming to increase resin production. We used different types of DNA markers, including last-generation molecular markers.

Results: We obtained 13 and 14 linkage groups for the C14 and C15 maps, respectively. A total of 215 (C14) and 211 (C15) markers were positioned on the maps, and the estimated genome lengths were 1,870 cM (C14) and 2,166 cM (C15), which represents nearly 65% genome coverage. Comparative mapping with previously developed genetic linkage maps for P. pinaster, based on about 60 common markers, made it possible to align the linkage groups to this reference map. The comparison of our annotated linkage maps with linkage maps reporting QTL information revealed 11 annotated SNPs in candidate genes that co-localized with previously reported QTLs for wood properties and water use efficiency.

Conclusions: This study provides genetic linkage maps from a Spanish population that shows high levels of genetic divergence from the French populations from which segregating progenies have previously been mapped. These genetic maps will be of interest for constructing a reliable consensus linkage map for the species. The importance of developing functional genetic linkage maps is highlighted, especially when working with breeding populations, for their future application in MAS for traits of interest.
Background 

Maritime pine (Pinus pinaster Ait.) is one of the most important species in the Mediterranean region for its ecology and wood production. Like other conifers, this long-lived species dominates different landscapes and can withstand severe environmental conditions [1]. Several studies have revealed high levels of phenotypic variation [2-4] and genetic diversity [5-7] in maritime pine. This species has a fragmented geographic distribution that could be subdivided into different meta-populations based on its high level of genetic differentiation [8-10]. In the Iberian Peninsula, different patterns of local adaptation have been identified [11]. Besides its ecological value, maritime pine is also a species of significant economic importance. In particular, P. pinaster is a major resin-producing species in the Iberian Peninsula [12]. Resin is the basis of many manufactured products such as turpentine, oils, varnishes, sealing wax and plastics. In the 1990s, resin tapping was reintroduced in Spain after a drastic reduction in the 1970s caused by the international crisis in this sector [13], and many of the abandoned stands have been tapped again. In particular, the natural stands of Central Spain are one of the most important resin tapping regions [14]. Because resin production shows high heritability [15], a breeding program is a useful strategy to improve productivity [16]. Consequently, several breeding programs have been implemented for resin production in maritime pine [17-19].

Genetic linkage mapping can facilitate marker-assisted selection (MAS) as it allows the identification of quantitative trait loci (QTL) [20-23]. Furthermore, as genome organization is well conserved in conifers, comparative mapping is a useful strategy to find homologous chromosomal segments involved in the genetic control of economic and adaptive traits [24,25].

Traditional molecular markers, such as proteins, RFLPs (Restriction Fragment Length Polymorphisms), RAPDs (Random Amplified Polymorphic DNAs), AFLPs (Amplified Fragment Length Polymorphisms) and nSSRs (nuclear Simple Sequence Repeats), have helped to build first-generation genetic linkage maps in forest trees [26,27]. The use of RAPDs and AFLPs, which are randomly distributed in the genome [28,29], has allowed the construction of genetic linkage maps for species with large genomes such as conifers [30-32]. An alternative for species with extremely large genomes, or for populations with low levels of polymorphism, is SAMPL (Selective Amplification of Microsatellite Polymorphic Loci). SAMPL combines the advantages of AFLPs and microsatellites, resulting in a higher percentage of polymorphic markers per assay and higher repeatability between assays [33].

In recent years, efforts have focused on sequencing genes of interest to build genetic linkage maps with direct functional information [34]. Functional genetic linkage mapping has been revolutionized by the availability of new sets of markers from coding regions, such as EST-Ps (Expressed Sequence Tag Polymorphisms), EST-SSRs (EST-derived microsatellites) and SNPs (Single Nucleotide Polymorphisms) [35-37]. Functional genetic linkage maps based on annotated genes make it possible to identify redundant and paralogous EST markers and further improve the quality and utility of genetic maps [38]. SNPs in particular have several advantages as molecular markers: they are very abundant in the genome, they are more stable than SSRs, and they are usually bi-allelic and codominant [39,40]. Moreover, new technologies have been developed for high-throughput detection and genotyping of SNPs, reducing the cost of assays [41,42]. Thus, highly saturated genetic linkage maps can be constructed even for species with large, unsequenced genomes such as conifers [21,43-46].

Like other pines, P. pinaster is a diploid organism characterized by a large and complex genome with a high low-copy fraction [47,48]. Maritime pine has 2n = 24 chromosomes and its genome size is estimated at 51-62 pg/2C [49,50]. Several genetic linkage maps have been developed for maritime pine based on proteins [51-54], RAPDs [54-57], AFLPs [44,49,54,58], SSRs [44,58-60], EST-Ps [44,58,61] and SNPs [44]. Comparative mapping has also been performed with Pinus taeda L. [44,61]. None of the genetic linkage maps available for P. pinaster has been derived from individuals belonging to Spanish populations. These populations show high levels of genetic divergence from the French populations used to design the mapping progenies of previous genetic linkage maps [8]. Because maritime pine shows a fragmented geographic distribution with high levels of population genetic structure and variation [6,8], it is important to explore the genetic organization of a representative population from the Castilian Plateau (Central Spain) and thus better cover the natural distribution of the species.

Thus, the main objective of this work was to construct saturated genetic linkage maps for P. pinaster using controlled crosses between two trees that take part in a breeding program for resin production in a natural population from Central Spain, as a first step towards the genetic dissection of this trait. By combining different kinds of molecular markers, we aimed to construct a map with annotated gene functions and with markers homologous to those of previous maps, thereby contributing to the development of a consensus map for the species. A second objective was to identify candidate genes overlapping with QTLs already detected in this species [62-64].

Methods 

Mapping populations 

Two outbred full-sib families of P. pinaster were used for genetic linkage mapping. The progenies originated from two reciprocal controlled crosses between two progenitors (C14 and C15) belonging to a natural population in Coca (Segovia), Central Spain (41° 12' N 4° 31' W). Previous studies on this population have shown a differential genetic structure when compared with other populations across the natural distribution of the species [8]. The progenitors took part in a breeding program for resin production started in 1994, and they were selected for their contrasting resin production: low for C14 and higher for C15. Controlled crosses were carried out in 1999 for C14xC15 and in 2000 for C15xC14. F1 seeds were collected and germinated under controlled conditions at the Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria, INIA (Madrid, Spain). The seedlings were then planted under semi-controlled conditions at the Dirección Nacional de Biodiversidad (Madrid, 40° 27' N 3° 44' W). A paternity test was performed with 13 SSRs. Once the contaminants were removed, the mapping population comprised 161 individuals: 106 from family C14xC15 and 55 from family C15xC14.
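The text does not describe how the paternity test was evaluated. A common way to flag contaminants is simple Mendelian exclusion: at every SSR locus an offspring must carry one allele that can come from each parent. The sketch below illustrates that rule only; the locus names and allele sizes are hypothetical, and treating any mismatching locus as evidence of contamination is our assumption, not the authors' documented criterion.

```python
from typing import Dict, List, Tuple

Genotype = Tuple[int, int]          # two allele sizes (bp) at one SSR locus
Profile = Dict[str, Genotype]       # locus name -> genotype


def compatible(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if the child can have received one allele from each parent."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def mismatching_loci(child: Profile, mother: Profile, father: Profile) -> List[str]:
    """Loci at which the child is incompatible with the putative parents.

    Loci missing from any of the three profiles (e.g. failed amplification)
    are skipped rather than counted as mismatches.
    """
    return [
        locus
        for locus, genotype in child.items()
        if locus in mother and locus in father
        and not compatible(genotype, mother[locus], father[locus])
    ]


# Hypothetical allele sizes for two loci (not data from this study).
c14 = {"locus_1": (150, 156), "locus_2": (201, 205)}
c15 = {"locus_1": (150, 160), "locus_2": (201, 201)}
true_sib = {"locus_1": (156, 160), "locus_2": (201, 205)}
contaminant = {"locus_1": (148, 150), "locus_2": (201, 205)}

print(mismatching_loci(true_sib, c14, c15))     # []
print(mismatching_loci(contaminant, c14, c15))  # ['locus_1']
```

In practice a tolerance is sometimes applied (for example, excluding only offspring that mismatch at two or more loci) to allow for genotyping errors; the threshold used in this study is not stated.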
Molecular markers 

Genomic DNA was extracted from needles using a modified protocol from Dellaporta et al. [65] for all marker analyses except the 1,536 and 384 GoldenGate assays (Illumina Inc., San Diego, CA, USA), for which a commercial Invisorb DNA Plant HTS 96 Kit (Invitek GmbH, Berlin, Germany) was used. Four types of molecular marker were used to genotype the mapping populations: nSSRs, EST-Ps, SAMPLs and SNPs.

nSSRs: Forty-seven primer pairs designed for amplification of nSSR loci in P. pinaster and P. taeda [60,66,67] were tested for segregation in the mapping populations. Thirteen loci were polymorphic, 27 were monomorphic, and seven resulted in multi-banding or unclear patterns. Amplification of loci A6F03, A5B01, A5A11, A6F10, A6D04 and A5B07 was performed as in Guevara et al. [66]. Amplification of loci NZPR823, NZPR413, NZPR114, NZPR544, SsrPt_ctg64 and SsrPt_ctg275 was performed as described by Chagne et al. [60], and amplification of PtTX3116 followed the protocol described by Auckland et al. [68] with a modified touchdown profile, using 55°C and 45°C as starting and final temperatures [69]. PCR reactions were carried out in a Perkin-Elmer GeneAmp 9700 thermal cycler (Perkin Elmer Inc., Waltham, Massachusetts, USA). Amplified products were separated in denaturing gels containing 6% acrylamide/bisacrylamide (19:3), 7 M urea and 1x TBE, and visualized in a DNA Analyzer System (4300, LI-COR Biosciences, Lincoln, NE, USA). Fragments were scored visually as codominant markers.

EST-Ps: EST-P genotyping was carried out by TILLING (Targeting Induced Local Lesions in Genomes) as described by Till et al. [70]. This technique allows the detection of multiple heterozygous SNP sites in the same progenitor [71]. A set of 14 EST-P primer pairs (PtIFG_893, PtIFG_9136, PtIFG_9034, PtIFG_1955, PtIFG_8429, PtIFG_8702, PtIFG_3C8E, PtIFG_22B8, PtIFG_1CA6C, PtIFG_9044, PtIFG_2253, PtIFG_8436, PtIFG_8887, PtIFG_C6H11) derived from cDNA sequences of P. taeda and P. pinaster [61,72,73] was tested in order to identify the most informative markers. A total of 11 EST-P primer pairs generated 25 polymorphic markers. PCRs were performed in 10 µl containing 10 ng of DNA, 1x PCR reaction buffer (Fermentas, Ontario, Canada), 0.2 mM of each dNTP, 2 mM MgSO4, 0.25 U Pfu DNA polymerase (Fermentas, Ontario, Canada) and 0.2 µM of each primer (forward primers were labeled at their 5' end with IRDye 700 and reverse primers with IRDye 800). PCR reactions were carried out in a Perkin-Elmer GeneAmp 9700 thermal cycler (Perkin Elmer Inc., Waltham, Massachusetts, USA). Thermocycler parameters were: 94°C for 2 min; 10 touchdown cycles of 94°C for 20 s, (Tm + 3)°C for 45 s (-0.8°C/cycle) and 72°C for 1 min; 45 cycles of 94°C for 20 s, (Tm - 5)°C for 45 s and 72°C for 1 min; and a final extension step of 72°C for 7 min. Amplification products were visualized on 1% agarose gels to verify amplification. PCR products were digested with CEL I nuclease purified as described by Till et al. [8]. The amount of nuclease added had previously been screened to optimize the detection of heteroduplexes at heterozygous sites. Partial DNA digestions were stopped by the addition of 5 µl of 0.5 M EDTA. The mixtures were transferred to 96-well Sephadex G50 spin plates (GE Healthcare, Waukesha, WI, USA), cleaned up by centrifugation into formamide solution, and heated at 70°C to reduce the volume to 8 µl. DNA fragments were separated in denaturing gels containing 8% Long Ranger polyacrylamide (Cambrex, East Rutherford, NJ, USA), 7 M urea and 1x TBE. Fragment detection was carried out on a DNA Analyzer System (4300, LI-COR Biosciences, Lincoln, NE, USA). Fragments were scored as dominant markers. Polymorphism was inferred from the resulting fragment pattern and confirmed by independently sequencing undigested amplified products from four haploid megagametophyte DNAs of each progenitor.
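The touchdown program above is easy to misread, so the sketch below expands it into per-cycle annealing temperatures. It assumes the 0.8°C decrement is applied after each of the 10 touchdown cycles (so the last touchdown cycle anneals at Tm - 4.2°C before the program drops to Tm - 5°C) and uses a hypothetical primer Tm of 60°C; denaturation and extension steps are omitted.

```python
from typing import List


def annealing_schedule(tm: float,
                       touchdown_cycles: int = 10,
                       start_offset: float = 3.0,
                       step: float = 0.8,
                       main_cycles: int = 45,
                       main_offset: float = -5.0) -> List[float]:
    """Annealing temperature (deg C) of every cycle of the EST-P PCR program.

    Touchdown phase: starts at Tm + 3 and drops by 0.8 deg C per cycle.
    Main phase: constant annealing at Tm - 5.
    """
    touchdown = [round(tm + start_offset - step * i, 1) for i in range(touchdown_cycles)]
    main = [round(tm + main_offset, 1)] * main_cycles
    return touchdown + main


schedule = annealing_schedule(tm=60.0)   # hypothetical primer Tm of 60 deg C
print(schedule[:10])   # touchdown phase, from 63.0 down to 55.8
print(schedule[10])    # 55.0, used for all 45 main cycles
print(len(schedule))   # 55 annealing steps in total
```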

SAMPLs: SAMPL genotyping was performed as indicated by Vos et al. [28] with several modifications [74]. Preamplifications were carried out using three primer combinations (EcoRI + A/MseI + C; EcoRI + A/MseI + G; EcoRI + A/MseI + T). For the selective amplification, a SAMPL primer [CATA: (CA)8(TA)2; GATA: (GA)8(TA)2 [75]] was used in combination with an EcoRI + 3 primer. In order to select the most informative combinations (those with a higher level of polymorphism), different combinations were tested using template DNA from the parental lines and 9 offspring. Progenitor C14 revealed lower levels of polymorphism than C15 (see Results section); thus, primer combinations were chosen in order to balance the number of markers segregating from each progenitor. A total of 31 CATA/EcoRI and 26 GATA/EcoRI primer combinations were used for the selective amplification. Selective PCR reactions were performed in 10 µl of 1x PCR buffer (10 mM Tris-HCl, 50 mM KCl, pH 8.3), 0.1 mM of each dNTP, 2.5 mM MgCl2 (Roche, Basel, Switzerland), 3 ng of IRDye 800 5'-end labeled CATA or GATA primer, 15 ng of EcoRI + 3 primer, 0.2 U Taq DNA polymerase (Invitrogen, Grand Island, NY, USA) and 5 µl of 10-fold diluted pre-amplification DNA fragments, using classical AFLP cycling parameters [12]. Samples were loaded into denaturing gels containing 8% Long Ranger polyacrylamide (Cambrex, East Rutherford, NJ, USA), 7 M urea and 1x TBE. Fragment detection was carried out on a DNA Analyzer System (4300, LI-COR Biosciences, Lincoln, NE, USA). Fragments were scored visually as dominant markers.

SNPs: two SNP genotyping assays were used in this study: a 1,536 BeadArray™ and a 384 BeadXpress® GoldenGate assay (Illumina Inc., San Diego, CA, USA). The SNPs selected for the 1,536 GoldenGate assay corresponded to three different sets (see Chancerel et al. [44] for further details): in vitro polymorphisms from 35 candidate genes for cell wall formation and drought stress resistance; in silico SNPs from a maritime pine EST assembly; and in silico polymorphisms from re-sequenced amplicons of the species. In this genotyping assay, 95 DNA samples of the mapping progenies were genotyped (73 for C14xC15 and 22 for C15xC14). In order to increase the number of genotyped individuals for a set of genes of interest, a second genotyping assay was developed. This assay (384 SNPlex) consisted of a subsample of SNPs selected from the 1,536 assay plus 14 additional SNPs from candidate genes for drought resistance [76]. It was carried out at the Center for Genomic Regulation (CRG, Barcelona, Spain) for a total of 119 DNA samples (79 for C14xC15 and 40 for C15xC14). Both genotyping assays were performed according to the manufacturer's instructions (Illumina Inc., San Diego, CA, USA), and SNP clusters were revised manually with Illumina BeadStudio v2.0 software. When the same SNP was successfully genotyped in both assays, priority was given to the 384 VeraCode data because more DNA samples were genotyped in this assay. Contig and gene sequences containing the polymorphic SNPs are presented in Additional file 1.
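The priority rule for SNPs typed in both assays can be sketched as a SNP-by-SNP merge. This is an illustration under stated assumptions: genotype calls are held in plain dictionaries, the SNP and sample identifiers are hypothetical, and priority is applied per SNP, as the text describes. Filling individual missing samples from the other assay is a possible alternative that the text does not mention.

```python
from typing import Dict, Optional

# snp id -> sample id -> genotype call (None = failed call)
Calls = Dict[str, Dict[str, Optional[str]]]


def successfully_genotyped(calls: Dict[str, Optional[str]]) -> bool:
    """True if at least one sample received a genotype call for this SNP."""
    return any(g is not None for g in calls.values())


def merge_assays(preferred: Calls, fallback: Calls) -> Calls:
    """Combine two SNP assays, SNP by SNP.

    If a SNP was successfully genotyped in `preferred` (here the 384-SNP assay),
    its calls are used; otherwise the calls from `fallback` (the 1,536-SNP assay)
    are kept. SNPs present in only one assay are carried over unchanged.
    """
    merged: Calls = {}
    for snp in sorted(set(preferred) | set(fallback)):
        if snp in preferred and successfully_genotyped(preferred[snp]):
            merged[snp] = dict(preferred[snp])
        else:
            merged[snp] = dict(fallback.get(snp, {}))
    return merged


# Hypothetical SNP ids, sample ids and calls.
assay_1536 = {"snp_a": {"s1": "AG", "s2": "AA"}, "snp_b": {"s1": "CC", "s2": None}}
assay_384 = {"snp_a": {"s1": "AG", "s2": "AG", "s3": "GG"}}

merged = merge_assays(preferred=assay_384, fallback=assay_1536)
print(merged["snp_a"])  # {'s1': 'AG', 's2': 'AG', 's3': 'GG'}
print(merged["snp_b"])  # {'s1': 'CC', 's2': None}
```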

Linkage map construction 

For each progenitor we assembled three different linkage maps from three datasets: C14xC15 (106 individuals), C15xC14 (55 individuals) and a dataset combining the individuals of both reciprocal crosses (161 individuals). Since no relevant differences were found as a consequence of merging both progenies (see Results section), further linkage analyses were performed using only the merged dataset. Parental maps were constructed using the "two-way pseudo-testcross" mapping strategy [77]. Markers with more than 70% missing data were excluded from further analysis. Linkage analyses and map estimations were performed using the regression mapping algorithm implemented in the software JoinMap v4.0 [78] with the CP population type, using a recombination fraction < 0.35 and a LOD > 3 as mapping parameters. Map distances were calculated using the Kosambi mapping function [79]. When marker order was difficult to estimate, two additional maps were constructed (map2 and map3): in map2, new markers are added because more pairwise data are available, and in map3 the remaining loci are added with decreasing statistical support. In these cases we kept map2 for further analyses. When a pair of markers was considered identical, only one of them was selected for mapping. In order to assign unlinked loci to selected linkage groups (LGs), the strongest cross link was employed with a LOD value of 3 (JoinMap command "assign ungrouped loci to SCL-groups"). Segregation ratios were tested using a χ² test (P < 0.01).
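JoinMap's regression mapping is not reproduced here, but the two-point quantities it relies on can be illustrated for a single marker pair. The sketch assumes a pair of testcross-type (1:1) markers in coupling phase with known recombinant counts, which is a simplification of the CP population type; the counts are hypothetical. It computes the recombination fraction, the LOD for linkage against free recombination (r = 0.5), the Kosambi distance d = 25 ln[(1 + 2r)/(1 - 2r)] cM, and a χ² segregation test using closed-form tail probabilities for 1 to 3 degrees of freedom.

```python
import math
from typing import Sequence, Tuple


def kosambi_cm(r: float) -> float:
    """Kosambi map distance in cM for a recombination fraction 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must lie in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


def two_point_lod(recombinants: int, n: int) -> float:
    """LOD for linkage at r = recombinants/n versus free recombination (r = 0.5),
    for a pair of testcross-type markers scored in n informative offspring."""
    r = recombinants / n
    log_l = 0.0
    if recombinants:                      # 0 * log(0) is taken as 0
        log_l += recombinants * math.log10(r)
    if n - recombinants:
        log_l += (n - recombinants) * math.log10(1 - r)
    return log_l - n * math.log10(0.5)


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable (closed forms, df = 1, 2, 3)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    if df == 3:
        return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    raise ValueError("only df = 1, 2 or 3 are implemented")


def segregation_test(observed: Sequence[int], ratio: Sequence[int]) -> Tuple[float, float]:
    """Chi-square goodness of fit of genotype-class counts to a Mendelian ratio
    (e.g. 1:1 for testcross markers, 1:2:1 or 3:1 for intercross markers)."""
    n, total = sum(observed), sum(ratio)
    expected = [n * k / total for k in ratio]
    x = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return x, chi2_sf(x, len(ratio) - 1)


# Hypothetical marker pair scored in the 161 pooled offspring, 12 recombinants.
r = 12 / 161
print(round(kosambi_cm(r), 1), round(two_point_lod(12, 161), 1))
print(segregation_test([80, 81], [1, 1])[1] > 0.01)    # True: fits 1:1
print(segregation_test([110, 51], [1, 1])[1] < 0.01)   # True: distorted at P < 0.01
```

With the thresholds quoted above, such a pair would be grouped only if its LOD exceeded 3 and its recombination fraction stayed below 0.35 (about 43 cM in Kosambi units).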

Evaluation of homogeneity of recombination rate 
between female and male meiosis 

In order to evaluate whether the male and female gametes presented different levels of recombination, we tested for departure from homogeneity of recombination fraction following Plomion et al. [57]. Since the statistical power of the homogeneity test depends largely on sample size, the test was performed for all marker pairs common to the three genetic maps of each progenitor and having a recombination fraction lower than 0.1 (Additional file 2).
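A homogeneity test of this kind can be sketched as a 2 x 2 contingency χ² on recombinant and non-recombinant counts for the same marker pair in the two reciprocal crosses. We assume that the first-named parent of each cross is the female (so C14 contributes female meiosis in C14xC15 and male meiosis in C15xC14), and that a standard Pearson contingency χ² is an adequate stand-in for the statistic of Plomion et al.; their exact formulation may differ, and with small expected counts an exact test could be preferable. The counts are hypothetical.

```python
import math
from typing import Tuple


def recombination_homogeneity(rec_female: int, n_female: int,
                              rec_male: int, n_male: int) -> Tuple[float, float]:
    """2 x 2 contingency chi-square (1 df) comparing recombinant / non-recombinant
    counts for the same marker pair in female versus male meiosis."""
    table = [[rec_female, n_female - rec_female],
             [rec_male, n_male - rec_male]]
    rows = [n_female, n_male]
    total = n_female + n_male
    cols = [rec_female + rec_male, total - (rec_female + rec_male)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            if expected > 0:               # an empty column contributes nothing
                chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2, math.erfc(math.sqrt(chi2 / 2))   # p-value for 1 df


# Hypothetical counts for one C14 marker pair: C14 as female parent in
# C14xC15 (106 offspring) and as male parent in C15xC14 (55 offspring).
chi2, p = recombination_homogeneity(8, 106, 4, 55)
print(round(chi2, 3), round(p, 2))
```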

Comparative mapping 

The linkage maps of both progenitors were compared based on common markers. In addition, the genetic maps were compared with previously developed P. pinaster maps [44] based on common SSRs, EST-Ps and SNPs. LGs were named according to Chancerel et al. [44] using the loblolly pine nomenclature, as loblolly pine is the reference pine species.
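The naming step can be illustrated as a majority vote: each parental LG takes the name of the reference LG that holds most of its common markers, and parental fragments that receive the same name become candidates for merging. This is our simplification for illustration; a real alignment would also check that marker order is conserved. Marker names and reference assignments are hypothetical, and ties are broken arbitrarily.

```python
from collections import Counter
from typing import Dict, List, Set


def name_linkage_groups(parental_lgs: Dict[str, Set[str]],
                        reference_lg_of: Dict[str, str]) -> Dict[str, str]:
    """Name each parental LG after the reference LG holding most of its common
    markers; LGs without common markers are left unnamed (ties: arbitrary)."""
    names = {}
    for lg, markers in parental_lgs.items():
        votes = Counter(reference_lg_of[m] for m in markers if m in reference_lg_of)
        if votes:
            names[lg] = votes.most_common(1)[0][0]
    return names


def groups_to_merge(names: Dict[str, str]) -> Dict[str, List[str]]:
    """Parental LG fragments that align to the same reference LG."""
    merged: Dict[str, List[str]] = {}
    for lg, ref in sorted(names.items()):
        merged.setdefault(ref, []).append(lg)
    return merged


# Hypothetical marker content and reference assignments.
parental = {"frag1": {"ssr_a", "snp_1", "snp_2"},
            "frag2": {"snp_3", "sampl_x"},
            "frag3": {"snp_4"}}
reference = {"ssr_a": "LG1", "snp_1": "LG1", "snp_2": "LG4",
             "snp_3": "LG1", "snp_4": "LG7"}

names = name_linkage_groups(parental, reference)
print(names)                    # {'frag1': 'LG1', 'frag2': 'LG1', 'frag3': 'LG7'}
print(groups_to_merge(names))   # {'LG1': ['frag1', 'frag2'], 'LG7': ['frag3']}
```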

Genome length and map coverage 

Total genome length was calculated as the sum of all mapped marker intervals. Estimated genome length (Ge) was determined from the partial linkage data according to Hulbert et al. [80] as modified by Chakravarti et al. [81] (Method 3). A minimum LOD score of three was chosen to estimate genome length, using framework maps constructed following the methodology described above, in order to avoid overestimating genome size because of clustered markers. Observed map coverage was calculated as the ratio of total genome length to estimated genome length [82].
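The Hulbert et al. estimator rests on a counting argument: among M markers there are M(M - 1)/2 pairs, and in a genome of length G a random pair lies within X cM with probability of roughly 2X/G (ignoring chromosome ends). Equating the expected number of linked pairs with the observed number K gives G = M(M - 1)X/K. The sketch below implements only this basic estimator, not the Method 3 modification of Chakravarti et al., together with observed map coverage. The genome-length inputs are hypothetical; the coverage calls use the observed and estimated map lengths reported in Table 1.

```python
def hulbert_genome_length(n_markers: int, n_linked_pairs: int, max_distance_cm: float) -> float:
    """Basic Hulbert et al. estimator G = M(M - 1)X / K.

    n_markers       M: number of framework markers
    n_linked_pairs  K: marker pairs linked at or above the LOD threshold
    max_distance_cm X: largest map distance observed among those linked pairs
    """
    if n_linked_pairs <= 0:
        raise ValueError("need at least one linked pair")
    return n_markers * (n_markers - 1) * max_distance_cm / n_linked_pairs


def map_coverage(observed_cm: float, estimated_cm: float) -> float:
    """Observed map coverage: total mapped length / estimated genome length."""
    return observed_cm / estimated_cm


print(hulbert_genome_length(100, 500, 40.0))      # 792.0 (hypothetical inputs)
print(f"{map_coverage(1180.4, 1870.2):.0%}")      # 63% (C14, Table 1)
print(f"{map_coverage(1379.5, 2166.6):.0%}")      # 64% (C15, Table 1)
```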

Marker distribution 

To evaluate whether markers were randomly distributed, we followed the procedure explained in Echt et al. [38]. A two-sample Kolmogorov-Smirnov test was used to compare the observed marker distribution frequencies with the frequencies expected under the assumption of randomness. The distribution of SAMPLs and SNPs was also analyzed by calculating the Pearson correlation coefficient between the number of SAMPLs and SNPs in each LG and the size of the LG, as in Cervera et al. [82].
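Both statistics are short to compute. The sketch below gives the two-sample Kolmogorov-Smirnov D (the largest gap between two empirical cumulative distributions) and the Pearson correlation coefficient. Comparing observed positions with uniformly simulated positions is our own way of generating an "expected under randomness" sample and may differ from the procedure of Echt et al.; all positions, LG lengths and counts are hypothetical. A p-value for D would additionally require the Kolmogorov distribution, for example via `scipy.stats.ks_2samp`.

```python
import math
import random
from bisect import bisect_right
from typing import Sequence


def ks_statistic(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov D: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    return max(abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
               for v in set(a) | set(b))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical marker positions (cM) on a 100 cM LG versus uniform positions.
observed = [2.0, 3.5, 4.1, 5.0, 48.0, 51.2, 52.3, 97.0]
rng = random.Random(1)
expected = [rng.uniform(0.0, 100.0) for _ in range(1000)]
print(round(ks_statistic(observed, expected), 2))

# Hypothetical LG lengths (cM) and SNP counts per LG.
lg_length = [52.0, 80.5, 95.1, 142.2]
snp_count = [5, 7, 9, 12]
print(round(pearson_r(lg_length, snp_count), 3))
```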
Heterozygosity levels 

The average heterozygosity was estimated for each progenitor and for each molecular marker type independently. Heterozygosity based on SSRs was calculated as the ratio between the number of polymorphic SSRs and the total number of tested SSRs, discarding those with multi-banding and unclear patterns. Three SSR primer pairs amplified two different loci with clearly different segregation patterns; these were scored as different markers but counted as only one for heterozygosity estimation. Heterozygosity based on SNPs was calculated as the ratio of polymorphic SNPs to the total number of SNPs successfully genotyped. Heterozygosity for SAMPLs was estimated only from the first primer combination tested, since subsequent ones were selected to maximize the number of polymorphic markers in C14 (see Molecular markers subsection). Heterozygosity based on EST-Ps was not calculated because we only analyzed markers that had been found polymorphic in previous studies in other pine species [61,72,73], which would have introduced a bias.
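The SSR-based estimate can be written as a simple ratio over scorable primer pairs. The text states that a primer pair amplifying two loci was counted once, but not how its state was decided; the sketch assumes it counts as heterozygous if at least one of its loci is. Primer-pair names and states are hypothetical.

```python
from typing import Dict, List


def heterozygosity(calls_by_primer_pair: Dict[str, List[str]]) -> float:
    """Proportion of scorable primer pairs that are heterozygous in one progenitor.

    Each primer pair maps to the states of the locus or loci it amplifies:
    'het', 'hom' or 'unclear' (multi-banding / unclear pattern). Primer pairs
    with any unclear locus are discarded; a primer pair amplifying two loci
    counts once, as heterozygous if at least one of its loci is (assumption).
    """
    scorable = {pp: s for pp, s in calls_by_primer_pair.items() if "unclear" not in s}
    if not scorable:
        raise ValueError("no scorable primer pairs")
    n_het = sum(1 for states in scorable.values() if "het" in states)
    return n_het / len(scorable)


# Hypothetical SSR results for one progenitor.
c14_ssrs = {"ssr1": ["het"], "ssr2": ["hom"], "ssr3": ["het", "hom"], "ssr4": ["unclear"]}
print(round(heterozygosity(c14_ssrs), 3))   # 0.667
```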

Functional annotation 

Functional annotation of the mapped SNP-based genes was carried out using sequence information from the Oligo Pool Assay (OPA), i.e. 60 nucleotides on each side of the SNP position. In order to retrieve longer homologous sequences, a BLASTN search was performed against the Pine Gene Index [83] and GenBank [84]. We retained the sequences showing the highest homology (e-values lower than 10^-10 were considered significant).
These longer sequences were then annotated using Blast2GO software [85]. Gene Ontology (GO) annotation terms for molecular function at ontology level 3 were placed on the map in order to search for clusters of genes with similar function. For sequences for which no level-3 GO annotation was available, we used the level-2 GO annotation terms. To evaluate whether similar GO terms were clustered or randomly distributed along the genome, we performed a two-sample Kolmogorov-Smirnov test for each level-2 GO term, as explained in the Marker distribution subsection.
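The two selection rules in this subsection, keeping the most significant BLAST hit below a cut-off and plotting level-3 GO terms with a fallback to level 2, can be sketched as follows. BLAST and Blast2GO are not called; the functions take already-parsed results. Sequence identifiers are hypothetical, and the cut-off passed in the example is a placeholder that should be set to the threshold actually used.

```python
from typing import Dict, List, Optional, Tuple

Hit = Tuple[str, float]   # (subject sequence id, e-value)


def best_hit(hits: List[Hit], max_evalue: float) -> Optional[Hit]:
    """Most significant BLAST hit below the e-value cut-off, or None."""
    significant = [h for h in hits if h[1] < max_evalue]
    return min(significant, key=lambda h: h[1]) if significant else None


def go_terms_to_plot(go_by_level: Dict[int, List[str]]) -> List[str]:
    """Molecular-function GO terms placed on the map: level 3 if any, else level 2."""
    return go_by_level.get(3) or go_by_level.get(2) or []


# Hypothetical hits for one OPA sequence; the cut-off is a placeholder value.
hits = [("pine_gi_contig_1", 1e-30), ("genbank_entry_2", 1e-45), ("pine_gi_contig_3", 0.5)]
print(best_hit(hits, max_evalue=1e-10))                      # ('genbank_entry_2', 1e-45)
print(go_terms_to_plot({3: [], 2: ["catalytic activity"]}))  # ['catalytic activity']
```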

In order to detect co-localizations between candidate genes and QTLs, the linkage maps developed in this study were aligned with previously constructed P. pinaster maps that contain enough orthologous markers to identify homologous LGs, together with the positions of QTLs for different traits [44,61,86].
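The text does not state how a gene position was compared with a QTL on another map. One simple approach, assumed here for illustration only, is to project the gene onto the reference LG by linear interpolation between flanking common markers (which presumes collinear marker order) and then check whether it falls inside the QTL confidence interval. All positions and the QTL interval are hypothetical.

```python
from typing import List, Tuple

Anchor = Tuple[float, float]   # (position on our map, position on reference map), cM


def project_position(pos: float, anchors: List[Anchor]) -> float:
    """Project a position onto the reference LG by linear interpolation between
    the flanking common markers. `anchors` must be sorted by our-map position."""
    for (x0, y0), (x1, y1) in zip(anchors, anchors[1:]):
        if x0 <= pos <= x1:
            if x1 == x0:
                return y0
            return y0 + (pos - x0) * (y1 - y0) / (x1 - x0)
    raise ValueError("position lies outside the anchored interval")


def colocalizes(ref_pos: float, qtl_interval: Tuple[float, float]) -> bool:
    """True if a projected gene position falls inside a QTL confidence interval."""
    lo, hi = qtl_interval
    return lo <= ref_pos <= hi


# Hypothetical anchors on one homologous LG and a hypothetical QTL interval.
anchors = [(0.0, 10.0), (50.0, 60.0), (100.0, 130.0)]
snp_ref_pos = project_position(25.0, anchors)
print(snp_ref_pos)                                # 35.0
print(colocalizes(snp_ref_pos, (30.0, 40.0)))     # True
```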

Marker nomenclature 

Marker nomenclature for SSRs and SNPs was maintained according to the original publications (see Molecular markers subsection). EST-Ps also kept their original nomenclature, with the size of the amplified band added to the marker name. SAMPLs were named with the differential selective nucleotide used in the preamplification (C, G or T), followed by the targeted microsatellite (CATA or GATA) and the selective EcoRI + 3 primer employed, ending with the size of the amplified fragment.
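The naming rules can be expressed as two small functions. The order of components follows the text; the underscore separators, the three-letter code for the EcoRI + 3 primer and the example values are assumptions for illustration, and the names used in the published maps may be formatted differently.

```python
def sampl_marker_name(preamp_nucleotide: str, microsatellite: str,
                      ecori_selective: str, fragment_bp: int) -> str:
    """Build a SAMPL marker name from its components (separator is an assumption)."""
    if preamp_nucleotide not in {"C", "G", "T"}:
        raise ValueError("pre-amplification selective nucleotide must be C, G or T")
    if microsatellite not in {"CATA", "GATA"}:
        raise ValueError("targeted microsatellite must be CATA or GATA")
    if len(ecori_selective) != 3:
        raise ValueError("EcoRI + 3 primer is identified by its 3 selective nucleotides")
    return f"{preamp_nucleotide}_{microsatellite}_{ecori_selective}_{fragment_bp}"


def est_p_marker_name(primer_pair: str, band_bp: int) -> str:
    """EST-P marker name: original primer-pair name plus amplified band size."""
    return f"{primer_pair}_{band_bp}"


print(sampl_marker_name("C", "GATA", "AAC", 245))   # C_GATA_AAC_245
print(est_p_marker_name("PtIFG_2253", 310))         # PtIFG_2253_310
```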

Results and discussion 

The paternity test revealed seven contaminants in C14xC15 and three in C15xC14, which were removed before further analyses. The final number of individuals per progeny, 106 for C14xC15 and 55 for C15xC14, was at the limit for reliable estimation, and as we did not observe significant differences in recombination fraction between female and male meiosis (see next subsection), we constructed the genetic linkage maps by pooling all individuals from both reciprocal crosses.

Evaluation of homogeneity of recombination between 
female and male meiosis 

Ninety-six marker pairs for C14 and 42 for C15 with a recombination fraction below 0.1 were available in all three maps (C14xC15, C15xC14 and pooled map) (see Methods section). For C14, eight of the 96 marker pairs analyzed showed significant differences between female and male meiosis (data not shown): five showed a higher recombination rate in male meiosis and three in female meiosis. No marker pair showed significant differences in recombination rate for C15. The low level of differences detected in recombination fraction between female and male meiosis supports merging both progenies in order to obtain a larger mapping population and thereby more precise parental maps. No evidence of heterogeneity of recombination had previously been reported for P. pinaster [56] or other conifer species [87]. However, Plomion and O'Malley [57] suggested that the recombination fraction could be higher in male meiosis in P. pinaster. The large difference in the number of individuals between our mapping progenies (106 versus 55) compelled us to perform the analyses with a narrow window of markers (only those with a recombination fraction lower than 0.1), whereas Plomion and O'Malley [57] used a wider window (marker pairs with a recombination fraction lower than 0.3). This could explain the difference in the results obtained. Nevertheless, further research on the homogeneity of recombination between female and male meiosis is needed to clarify whether female and male gametes exhibit similar recombination rates, which can have implications for MAS.
Individual linkage maps and comparative mapping 

A preliminary analysis of the mapping population with four AFLP primer combinations revealed very low levels of polymorphism (data not shown). Therefore, we decided to use the SAMPL technique to increase the number of polymorphic fragments. SAMPL analysis was performed using the most informative primer combinations (see Methods section). The number of polymorphic SAMPL markers obtained (Table 1) supports the use of SAMPLs as an alternative for genotyping populations with low polymorphism.

Out of the total set of molecular markers available (Table 1), four and five markers were excluded from the C14 and C15 datasets, respectively, because their segregation profiles were identical to those of other markers. All of them were SNPs belonging to the same gene or contig. Markers with more than 70% missing data were also excluded; most of them were SAMPLs genotyped only in the C15xC14 pedigree. In addition, in a "two-way pseudo-testcross" mapping strategy, intercross markers, i.e. markers with the same heterozygous allelic configuration in both progenitors, are less informative. Because of that, several SAMPLs, SNPs and one microsatellite marker were excluded. However, where possible we kept some intercross markers because they allow homologous LGs to be aligned between both parental maps.

Nearly 5% of the markers used for linkage analysis were unlinked, which is in the same range as observed in other conifer maps [24,45]. Most of them had more than 35% missing data and corresponded to SAMPLs genotyped only in the C14xC15 pedigree. Several SNPs were also unlinked. Nearly 94% of the markers could be assigned to LGs and 60% could be positioned in the final maps (Table 1, Figure 1, Figure 2 and Additional file 3). The lower percentage of positioned markers compared with other highly saturated maps [46] is due to the use of SAMPLs scored in only one of the mapping progenies, as revealed by the low percentage of positioned SAMPLs (Table 1). When we discard SAMPLs scored in only one of the two mapping progenies, the percentage of SAMPL markers positioned in the parental maps increases to 65.3% for C14 and 72.1% for C15. These results are very similar to those obtained with positioned SNPs (Table 1), indicating that both types of markers are suitable for the construction of linkage maps. Moreover, for complete coverage of the genome it is useful to combine markers with different target sequences, since coding and non-coding regions seem not to be randomly distributed along the genome [87,88].

In a first phase, before alignment with the reference P. pinaster linkage map, 22 LGs were obtained for C14 and 20 for C15 (Table 1, Figure 1 and Figure 2). The smallest LGs were similar in size in the C14 and C15 maps. However, the largest LG was longer in C15 than in C14, and the average LG length was also greater for C15 (69 vs 53.7 cM; Table 1). This is explained by the larger average and maximum distances between adjacent markers in C15 than in C14. Thirty-one intercross markers shared by both parental maps allowed the identification of homologous LGs (Table 2). Eleven markers with 1:2:1 segregation (same heterozygous combination in both parents) could only be positioned in one parental map (Table 2). Five of them could not be mapped in the other parent because they were ungrouped, and the remaining six because their inclusion increased the goodness-of-fit statistic of the marker order (i.e. worsened the fit).

The alignment with the maps described by Chancerel et al. [44], based on common SSRs, EST-Ps and SNPs (Table 2), made it possible to bring together some LGs, resulting in 13 LGs for C14 and 14 LGs for C15 (Table 1), close to the 12 chromosomes of the haploid P. pinaster genome [89]. In general, parental LGs were of similar size, except for LGs 1, 3 and 10, which were larger in the C15 map, and LG12, which was larger in the C14 map. The fact that we could not assemble the markers into 12 LGs, the differences in size between homologous LGs and the presence of common markers positioned in only one parental map are probably related to homozygous regions in the genomes of these individuals, which prevent mapping markers in these areas. This effect was partially expected because previous studies of Coca, the population of origin of both parental trees, revealed a high inbreeding coefficient [90]. As a result of inbreeding, we would expect a loss of polymorphism in individuals from this population, as confirmed by the low levels of polymorphism detected by AFLP genotyping (data not shown) and the low heterozygosity of the parental trees (Table 3) compared with the heterozygosity observed in other provenances of P. pinaster [5].

In this respect, it is important to point out the difference in heterozygosity between the C14 and C15 parental trees revealed by the SAMPL-based estimates (Table 3). This difference was overcome by further genotyping with selected SAMPL primer combinations yielding a higher number of polymorphic markers in C14 (see Methods section). Heterozygosity estimates from SSRs and SNPs were lower than those from SAMPLs, and no difference in heterozygosity between C14 and C15 could be appreciated. One possible explanation is that the analyzed SNPs were selected from coding regions, where the level of polymorphism is lower than in non-coding regions [91].
Table 1 Mapping parameters of parental linkage maps constructed by merging two reciprocal crosses: C14xC15 and C15xC14

Mapping parameter                                          C14            C15
Total number of available markers                          402            410
   Number of SSR loci                                      11             13
   Number of ESTP loci (a)                                 13             12
   Number of SAMPL loci                                    228            237
   Number of SNP loci (b)                                  150            148
Total number of distorted (p < 0.01) markers               39             52
Number of excluded markers (c)                             62             72
   Number of SSR loci                                      1              1
   Number of ESTP loci                                     0              0
   Number of SAMPL loci                                    46             60
   Number of SNP loci                                      15             11
Number of markers not excluded                             340            338
Number of assigned markers (d)                             321            319
   Number of SSR loci                                      9              10
   Number of ESTP loci                                     11             11
   Number of SAMPL loci                                    174            166
   Number of SNP loci                                      127            132
Number of positioned markers (e)                           215 (63.2%)    211 (62.4%)
   Number of SSR loci                                      6 (60%)        7 (58.3%)
   Number of ESTP loci                                     10 (76.9%)     7 (58.3%)
   Number of SAMPL loci                                    98 (53.8%)     98 (55.4%)
   Number of SNP loci                                      101 (74.8%)    99 (72.3%)
Number of distorted positioned markers                     13             13
Unlinked markers (%) (f)                                   19 (5.6%)      19 (5.6%)
Number of LG >3 before making alignments                   22             20
Number of LG >3 after making alignments                    13             14
Smallest LG (cM) before making alignments                  17.5           13.4
Largest LG (cM) before making alignments                   81.1           155.3
Average length (cM) of a LG ± SD before alignments         53.7 ± 20.6    69 ± 35.6
Smallest LG (cM) after making alignments                   52             42.3
Largest LG (cM) after making alignments                    142.2          155.3
Average length (cM) of a LG ± SD after alignments          90.8 ± 29.14   98.5 ± 38
Maximum distance (cM) between 2 adjacent markers           24.9           35.3
Average distance (cM) between 2 adjacent markers ± SD      6.12 ± 5.8     7.22 ± 6.4
Observed map length (cM)                                   1180.4         1379.5
Estimated map length (cM)                                  1870.2         2166.6
Observed map coverage                                      63%            64%

(a) The 25 ESTPs correspond to 11 gene loci.
(b) The SNP markers correspond to 47 gene loci and 143 contigs.
(c) Markers with more than 70% missing data (see Methods section) and identical markers.
(d) Assigned markers correspond to markers linked with more than 2 other markers.
(e) Unpositioned markers correspond to markers with a recombination frequency higher than 0.35 with the nearest linked marker (unlinked markers) or markers whose position could not be reliably estimated. Percentage of positioned markers was calculated over the number of not excluded markers.
(f) Percentage of unlinked markers was calculated over the number of not excluded markers.
SD Standard deviation.
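The percentages in the positioned-marker rows of Table 1 follow the footnote rule: positioned markers divided by markers not excluded (available minus excluded). A minimal sketch using the per-type counts from Table 1 is shown below; the names are illustrative.

```python
# Percentages in the positioned-marker rows of Table 1, following its
# footnote: positioned markers / markers not excluded (available - excluded).
TABLE1_COUNTS = {
    # marker type: (available, excluded, positioned)
    "C14": {"SSR": (11, 1, 6), "ESTP": (13, 0, 10),
            "SAMPL": (228, 46, 98), "SNP": (150, 15, 101)},
    "C15": {"SSR": (13, 1, 7), "ESTP": (12, 0, 7),
            "SAMPL": (237, 60, 98), "SNP": (148, 11, 99)},
}


def percent_positioned(available: int, excluded: int, positioned: int) -> float:
    """Percentage of positioned markers over the markers that were not excluded."""
    not_excluded = available - excluded
    if not_excluded <= 0:
        raise ValueError("no markers left after exclusion")
    return 100.0 * positioned / not_excluded


def totals(by_type: dict) -> tuple:
    """Sum (available, excluded, positioned) across marker types."""
    return tuple(sum(counts[i] for counts in by_type.values()) for i in range(3))


for parent, by_type in TABLE1_COUNTS.items():
    for marker_type, counts in by_type.items():
        print(f"{parent} {marker_type}: {percent_positioned(*counts):.1f}%")
    print(f"{parent} all types: {percent_positioned(*totals(by_type)):.1f}%")
```

Summing the per-type counts gives 215 of 340 positioned markers (63.2%) for C14 and 211 of 338 (62.4%) for C15.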



Twenty-four contigs with several SNPs (from two to seven) were included in the linkage maps. SNPs belonging to the same gene or contig always mapped to the same position or less than 3 cM apart (Figure 1 and Figure 2), except m682 and m127 (LG 5), which were separated by 26.8 cM. Marker m682 was distorted at the 0.1% significance level, which could affect the accuracy of its estimated position. Alternatively, the two SNPs may belong to different loci on the same LG. The fact that nearly all SNPs belonging to the same contig mapped to the same position supports the accuracy of the genotyping method used, as previously reported [41,44].
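The consistency check described above (SNPs from the same contig should map to the same LG and within about 3 cM of each other) can be sketched as follows. The positions and contig names are hypothetical, and the 3 cM threshold is taken from the text as an illustrative cut-off rather than a formal criterion stated by the authors.

```python
from collections import defaultdict

# Hypothetical map positions (contig_id, snp_id, linkage_group, position_cM);
# illustrative values only, not the published maps.
SNP_POSITIONS = [
    ("contig_A", "snp1", 5, 40.0),
    ("contig_A", "snp2", 5, 41.5),
    ("contig_B", "snp3", 5, 12.0),
    ("contig_B", "snp4", 5, 38.8),
    ("contig_C", "snp5", 2, 70.2),
]

MAX_SPREAD_CM = 3.0  # illustrative threshold taken from the text


def inconsistent_contigs(positions, max_spread=MAX_SPREAD_CM):
    """Return contigs whose SNPs fall on different LGs or span more than max_spread cM."""
    by_contig = defaultdict(list)
    for contig, _snp, lg, cm in positions:
        by_contig[contig].append((lg, cm))
    flagged = {}
    for contig, hits in by_contig.items():
        if len(hits) < 2:
            continue  # a single SNP cannot be checked for consistency
        lgs = {lg for lg, _ in hits}
        spread = max(cm for _, cm in hits) - min(cm for _, cm in hits)
        if len(lgs) > 1 or spread > max_spread:
            flagged[contig] = (sorted(lgs), round(spread, 1))
    return flagged


print(inconsistent_contigs(SNP_POSITIONS))
```

With these toy positions only contig_B is flagged, mirroring the kind of 26.8 cM discrepancy reported for m682 and m127.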

Alignments with the linkage maps developed by Chancerel et al. [44] showed that marker order was highly conserved, except for small inversions of less than 5 cM (data not shown). The only major inconsistency was found for marker PtIFG_8436_200, which was amplified using the same primer combination as in Chancerel et al. [44] but genotyped with a different detection technique (TILLING versus SSCP; Figure 2). This EST-P marker was mapped to LG 10 in our mapping progeny, in agreement with previously developed maps in P. taeda [72]. However, in other published linkage maps of P. pinaster this gene was mapped to LG 7 [44,61]. Chagné et al. [61] discussed the possibility that PtIFG_8436 targeted a paralogous gene in P. pinaster, as they found low similarity at the DNA sequence level. Our result suggests the existence of an orthologous sequence between the P. pinaster and P. taeda genomes for the region amplified by the PtIFG_8436 marker in LG 10, and of a paralogous sequence in LG 7 of the P. pinaster genome.

Segregation distortion 

A χ² test (d.f. = 1) was performed to test the Mendelian segregation of each marker. We detected 9.7% of markers showing distorted segregation ratios at the 1% significance level for the C14 linkage map and 12% for the C15 linkage map (Table 1).
Figure 1 Genetic linkage maps: LGs 1 to 6. Bars on the left represent the LGs obtained for C14 and bars on the right the LGs obtained for C15. Common markers between both maps are in bold and connected with a solid line. Markers in italics are in common with the maps of Chancerel et al. [44], and the homologous LG in that study is indicated in brackets. Markers showing any special feature (see Results section) are underlined. Markers in color are candidate genes that co-localize with QTLs reported in previously published maps for wood properties (green), isotopic composition of ¹³C (violet) and ring growth (blue). SNPs belonging to the same contig are surrounded by a solid line and, when they were too far from each other, they are connected by a solid line on the left of the chromosome bar. Markers showing significantly distorted segregation ratios are indicated with asterisks (*** means significant at 0.01 p-value, **** at 0.005, ***** at 0.001, ****** at 0.0005 and ******* at 0.0001). Annotations of SNPs are indicated by the term GO and a numeric code. Numeric codes for molecular function annotation level 2: 1 - binding; 2 - catalytic activity; 3 - structural molecule activity; 4 - transporter activity; 5 - enzyme regulator activity. Numeric codes for molecular function level 3: 1.1 - nucleic acid binding; 1.2 - nucleotide binding; 1.3 - protein binding; 1.4 - carbohydrate binding; 1.5 - lipid binding; 2.1 - hydrolase activity; 2.2 - transferase activity; 5.1 - sequence-specific DNA binding; 6.1 - signal transducer activity.



These results are very similar to those obtained in other pine [92,93] and conifer species [36]. The number of distorted markers that were excluded or unlinked was similar to the number of distorted markers finally assigned to LGs (Additional file 4). Moreover, among the unlinked and excluded loci, the number of distorted markers was not higher than the number of markers showing no segregation distortion (Table 1). Thus, in this case, unlinked and excluded loci do not seem to result from segregation distortion, as previously reported in other linkage studies [24]. Distorted markers assigned to an LG were randomly distributed (Additional file 4). Only 13 distorted markers could be positioned in each map, indicating the difficulty of estimating accurate positions for distorted markers. Distorted markers positioned in the maps did not appear clustered in specific regions of the genome (Table 1, Figure 1 and Figure 2), suggesting that segregation distortion was probably related to genotyping errors rather than to pre- or post-zygotic selection. As they were not clustered, they did not compromise map structure [93].
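The per-marker test for Mendelian segregation with 1 d.f. can be sketched as a chi-square goodness-of-fit test on two genotype classes. This sketch assumes a 1:1 expectation by default (a marker heterozygous in one parent only) and also accepts a 3:1 ratio, for instance for a dominant marker heterozygous in both parents; the expected ratios actually used for each marker type are not given in this section, and the offspring counts are hypothetical.

```python
import math


def chi2_two_class(n_a: int, n_b: int, ratio=(1, 1)):
    """Chi-square goodness-of-fit test for two genotype classes (1 d.f.).

    ratio is the expected segregation, e.g. (1, 1) for a marker
    heterozygous in one parent only. Returns (statistic, p_value).
    """
    n = n_a + n_b
    if n == 0:
        raise ValueError("no genotyped offspring")
    r_a, r_b = ratio
    exp_a = n * r_a / (r_a + r_b)
    exp_b = n * r_b / (r_a + r_b)
    stat = (n_a - exp_a) ** 2 / exp_a + (n_b - exp_b) ** 2 / exp_b
    # Upper-tail probability of a chi-square with 1 d.f.: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def is_distorted(n_a: int, n_b: int, ratio=(1, 1), alpha: float = 0.01) -> bool:
    """Flag a marker whose segregation departs from the expected ratio at alpha."""
    return chi2_two_class(n_a, n_b, ratio)[1] < alpha


# Hypothetical offspring counts for three markers (not data from this study).
for counts in [(50, 50), (58, 42), (70, 30)]:
    stat, p = chi2_two_class(*counts)
    print(counts, round(stat, 2), f"{p:.2g}", is_distorted(*counts))
```

A statistic above about 6.635 corresponds to p < 0.01 with 1 d.f., the significance level used for the distorted-marker counts in Table 1.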

Marker distribution

Markers were randomly distributed along the genome, as no significant differences were found between the distribution of markers along the LGs and the expected distribution under the hypothesis of randomness (two-sample Kolmogorov-Smirnov test, D = 0.55, p-value = 0.124 for the C14 map and D = 0.5, p-value = 0.474 for the C15 map), in accordance with other conifer maps [24,32]. Moreover, larger LGs carried more SNPs than smaller LGs (Pearson correlation, r = 0.53, p-value = 0.01 for the C14 map and r = 0.62, p-value = 0.003 for the C15 map). The same results were obtained for SAMPLs (Pearson correlation, r = 0.69, p-value < 0.001 for the C14 map and r = 0.75, p-value < 0.001 for the C15 map), indicating that they are also randomly distributed along the genome, as expected for this kind of multiband marker [28].
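The two statistics used in this subsection can be computed as sketched below: the two-sample Kolmogorov-Smirnov statistic D (the largest gap between two empirical cumulative distributions) and Pearson's r between LG length and the number of markers per LG. This is a minimal sketch with hypothetical values; how the expected distribution under randomness was built is not described here, and p-values are omitted (scipy.stats.ks_2samp and scipy.stats.pearsonr return them when SciPy is available).

```python
import math
from bisect import bisect_right


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def ks_two_sample_d(sample1, sample2):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F1(t) - F2(t)|.

    The empirical CDFs only change at observed values, so checking every
    pooled observation is enough to find the maximum gap.
    """
    s1, s2 = sorted(sample1), sorted(sample2)
    n1, n2 = len(s1), len(s2)
    return max(
        abs(bisect_right(s1, t) / n1 - bisect_right(s2, t) / n2)
        for t in s1 + s2
    )


# Hypothetical LG lengths (cM) and SNP counts per LG, for illustration only.
lg_length_cm = [52.0, 75.5, 88.1, 96.4, 110.3, 142.2]
snps_per_lg = [4, 7, 6, 11, 10, 15]
print(f"r = {pearson_r(lg_length_cm, snps_per_lg):.2f}")
```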
Figure 2 Genetic linkage maps: LGs 7 to 15. Bars on the left represent the LGs obtained for C14 and bars on the right the LGs obtained for C15. Common markers between both maps are in bold and connected with a solid line. Markers in italics are in common with the maps of Chancerel et al. [44], and the homologous LG in that study is indicated in brackets. Markers showing any special feature (see Results section) are underlined. Markers in color are candidate genes that co-localize with QTLs reported in previously published maps for wood properties (green), isotopic composition of ¹³C (violet) and ring growth (blue). SNPs belonging to the same contig are surrounded by a solid line and, when they were too far from each other, they are connected by a solid line on the left of the chromosome bar. Markers showing significantly distorted segregation ratios are indicated with asterisks (*** means significant at 0.01 p-value, **** at 0.005, ***** at 0.001, ****** at 0.0005 and ******* at 0.0001). Annotations of SNPs are indicated by the term GO and a numeric code. Numeric codes for molecular function annotation level 2: 1 - binding; 2 - catalytic activity; 3 - structural molecule activity; 4 - transporter activity; 5 - enzyme regulator activity. Numeric codes for molecular function level 3: 1.1 - nucleic acid binding; 1.2 - nucleotide binding; 1.3 - protein binding; 1.4 - carbohydrate binding; 1.5 - lipid binding; 2.1 - hydrolase activity; 2.2 - transferase activity; 5.1 - sequence-specific DNA binding; 6.1 - signal transducer activity.



Table 2 Markers used for comparative mapping within the species

Marker                                                             C14     C15
Common markers between both parental maps                              33
Markers segregating in both parents positioned only in one
parental map                                                       5       6
Common markers with Chancerel et al. [44]                          65      57
   Common SSR loci                                                 2       3
   Common ESTP loci                                                7       3
   Common SNP loci                                                 56      51
Number of LGs without common markers with Chancerel et al. [44]    1       2


Genome length and map coverage 

Observed genome length ranged from 1,180.4 cM (C14) to 1,379.5 cM (C15), about 200 cM larger for the C15 map than for the C14 map (Table 1). In other P. pinaster maps, observed genome length ranged from 869 to 1,860 cM depending on marker density [44,55,58]. The greater observed genome length of the C15 map agrees with its higher heterozygosity estimate compared with the C14 map (Table 3). Estimated genome length ranged from 1,870.2 to 2,166.6 cM, in line with values obtained for previous P. pinaster maps (1,223 to 3,252 cM depending on the estimation method [44,56,88]). The most recent maps estimated the P. pinaster genome length at 2,500 cM [44], a value close to our estimates and to those of other pine species [94]. Estimated genome length was higher for the C15



de Miguel et al. BMC Genomics 2012, 13:527
http://www.biomedcentral.com/1471-2164/13/527



Table 3 Heterozygosity

Marker    C14                                  C15
          Poly.   Mono.   Heterozygosity (%)   Poly.   Mono.   Heterozygosity (%)
SSR       9       31      0.23                 10      30      0.25
SAMPL     133     191     0.41                 251     191     0.57
SNP       150     672     0.18                 96      726     0.12

Percentage of heterozygosity calculated as the ratio of polymorphic markers segregating in each parental map over the total number of polymorphic (Poly.) and monomorphic (Mono.) markers.

Heterozygosity estimates for SAMPLs were calculated for the first primer combination tested in the family.



linkage map, although the observed map coverage (near 65%, Table 1) was similar for both parental maps. High-density genetic linkage maps usually report map coverage over 90% [46,95]. However, previously published P. pinaster genetic linkage maps also reported map coverage near 65% [44,57], indicating the difficulty of achieving complete coverage for such a large and complex genome.
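Observed map coverage in Table 1 is the observed map length divided by the estimated genome length. The sketch below recomputes it from the Table 1 values; the genome-length estimator itself is described in the Methods and is not reproduced here, and the dictionary layout is illustrative.

```python
# Observed map coverage = observed map length / estimated genome length,
# using the values reported in Table 1.
MAPS = {
    "C14": {"observed_cm": 1180.4, "estimated_cm": 1870.2, "n_lg": 13},
    "C15": {"observed_cm": 1379.5, "estimated_cm": 2166.6, "n_lg": 14},
}


def map_coverage(observed_cm: float, estimated_cm: float) -> float:
    """Fraction of the estimated genome length spanned by the linkage groups."""
    if estimated_cm <= 0:
        raise ValueError("estimated genome length must be positive")
    return observed_cm / estimated_cm


for parent, m in MAPS.items():
    coverage = map_coverage(m["observed_cm"], m["estimated_cm"])
    mean_lg = m["observed_cm"] / m["n_lg"]  # average LG length after alignment
    print(f"{parent}: coverage {coverage:.0%}, mean LG length {mean_lg:.1f} cM")
```

This gives 63% for C14 and 64% for C15; dividing observed length by the number of LGs also recovers the average LG lengths after alignment reported in Table 1 (90.8 and 98.5 cM).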

Functional annotation 

We validated and improved the functional annotation information for mapped SNPs. Significant sequence homology was found in the Pine Gene Index database for 160 of the 171 mapped SNPs in both parental linkage maps [83] (Additional file 5). Sequence homology was found with several species, the top hits being Picea sitchensis, Pinus taeda, Pinus radiata, Vitis vinifera and Picea glauca (Additional file 5). Nine of the 171 sequences showed no match with the InterPro database [96], and no GO term was found for seven sequences. Thus, a total of 144 sequences were annotated, 132 of them for molecular function and 101 with annotation for molecular function level 2 or 3. Most of the mapped SNPs were associated with cDNAs belonging to the GO terms binding, catalytic activity and hydrolase activity (Additional file 6). As expected, SNPs belonging to the same contig showed identical GO annotation terms. However, we could not statistically confirm (two-sample Kolmogorov-Smirnov test not significant, data not shown) whether neighboring SNPs belonging to different genes or contigs exhibited GO terms for the same molecular function. Denser genetic maps with deeper functional annotation are required to evaluate whether genes with similar functions are clustered.

The comparison of our annotated linkage maps with linkage maps reporting QTL information revealed candidate genes for several QTLs for wood properties or for the isotopic composition of ¹³C (δ¹³C) [61,62,64]. δ¹³C is a trait closely related to water use efficiency [97]. In our study, SNPs annotated as water-stress inducible proteins, AQUAPORINs and DEHYDRINs were positioned in the same regions as QTLs for δ¹³C [62] (Table 4). This outcome reinforces the hypothesis that the genomic regions identified by QTL analysis [62] might play a key role in the genetic control of water use efficiency. In addition, SNPs associated with a CELLULOSE SYNTHASE CESA3, a PEROXIDASE (enzyme involved


Table 4 Co-localizations of SNPs and QTLs

SNP ID   Sequence description                        e-value     LG   Trait QTL     Reference
m157     CELLULOSE SYNTHASE                          1.5e-??     3    wood          Pot et al. [64]
m264     PEROXIDASE                                  2.9e-147    8    wood
m941     ENDO-1,4-BETA-XYLANASE A-LIKE               0           4    wood
m1542    WATER-STRESS INDUCIBLE PROTEIN 1                        2    δ¹³C          Brendel et al. [62]
m1543    WATER-STRESS INDUCIBLE PROTEIN 3            7.3e-34     2    δ¹³C
m426     WATER DEFICIT INDUCIBLE LP3-LIKE PROTEIN    3.2e-62     3    δ¹³C
m295     AQUAPORIN                                   9.4e-??     5    δ¹³C
m965     THYLAKOID LUMENAL 19 KDA PROTEIN            ?e-119      5    ring growth
m712     DEHYDRIN 9 PROTEIN                          1.2e-??     8    δ¹³C
m716     DEHYDRIN 2                                              8    δ¹³C
m859     THYLAKOID LUMENAL PROTEIN                   9.8e-45     12   δ¹³C

Co-localization of mapped SNPs with QTLs detected in previously published maps of P. pinaster. δ¹³C stands for the isotopic composition of ¹³C. Wood stands for wood chemical composition and fibre properties.
in lignin polymerization [98]) and an ENDO-1,4-BETA-XYLANASE A-LIKE gene (Additional file 5) co-localized with QTLs for wood chemical composition and fiber properties [64] (Table 4). This result strengthens the evidence for the functions assigned to these genes and is especially relevant considering that orthologous QTLs for wood properties were also found in other Pinus species [61]. This finding highlights the importance of developing functional genetic linkage maps as tools to search for favorable allelic variants to be implemented in MAS.
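The co-localization of annotated SNPs with previously reported QTLs (Table 4) amounts to asking whether a SNP falls inside a QTL interval on the homologous LG. The sketch below assumes that SNP positions and QTL confidence intervals have already been projected onto a common reference map, and it uses an optional margin in cM; the data, identifiers and margin are hypothetical, since the exact criterion is not stated in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Qtl:
    trait: str
    lg: int
    start_cm: float  # lower bound of the QTL confidence interval
    end_cm: float    # upper bound of the QTL confidence interval


@dataclass(frozen=True)
class MappedSnp:
    snp_id: str
    lg: int
    position_cm: float


def colocalized(snps, qtls, margin_cm: float = 0.0):
    """Return (snp_id, trait) pairs where a SNP falls inside a QTL interval.

    Assumes SNP and QTL positions are already expressed on a common
    reference map (e.g. after aligning LGs through shared markers).
    """
    hits = []
    for snp in snps:
        for qtl in qtls:
            same_lg = snp.lg == qtl.lg
            inside = qtl.start_cm - margin_cm <= snp.position_cm <= qtl.end_cm + margin_cm
            if same_lg and inside:
                hits.append((snp.snp_id, qtl.trait))
    return hits


# Hypothetical positions, for illustration only.
snps = [MappedSnp("snp_a", 5, 42.0), MappedSnp("snp_b", 5, 80.0), MappedSnp("snp_c", 8, 10.0)]
qtls = [Qtl("d13C", 5, 35.0, 50.0), Qtl("wood", 8, 60.0, 75.0)]
print(colocalized(snps, qtls))
```

A wider margin makes the criterion more permissive, so the number of candidate genes reported depends on how QTL intervals are defined in the source maps.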

Conclusions 

Our study demonstrates the importance of developing genetic linkage maps from different populations representing different genetic backgrounds in order to generate an accurate consensus linkage map for a species. Comparative mapping is a key process for understanding genome organization and evolution in conifers. For that purpose, it is essential to correctly distinguish orthologous from paralogous genes. New efforts to detect orthologous markers, together with progress in sequencing conifer genomes, will improve comparative mapping studies in the future. Here we also confirm the importance of developing functional genetic linkage maps, especially when working with breeding populations, for their future application in MAS for traits of interest.